- Free, publicly-accessible full text available March 1, 2026
- Free, publicly-accessible full text available January 8, 2026
- Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and the number of training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. Namely, a small ω (which measures parameter sharing) and a large ψ (which measures the rank) reliably lead to better scaling laws. Guided by the insight that full-rank structures maximizing parameters per unit of compute perform best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to standard sparse MoE, where each expert is an entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find that BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE. (A minimal einsum sketch of such structured layers appears after this list.) Free, publicly-accessible full text available December 10, 2025.
- Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insofar as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks, prove conditions under which the procedure is well-behaved, and provide a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting. (The standard parameter-space MAP baseline is sketched in code after this list.)
- Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. This method, which we refer to as function-space empirical Bayes (FS-EB), includes both parameter- and function-space regularization, is mathematically simple, easy to implement, and incurs only minimal computational overhead compared to standard regularization techniques. We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection, highly calibrated predictive uncertainty estimates, successful task adaptation from pre-trained models, and improved generalization under covariate shift. (A generic sketch combining parameter- and function-space penalties appears after this list.)
- The drawing process is crucial to understanding the final result of a drawing. There is a long history of work on understanding human drawing: what kinds of strokes people use and where they are placed. An area of interest in Artificial Intelligence is developing systems that simulate human behavior in drawing. However, little work has been done to understand the order of strokes in the drawing process. Without a sufficient understanding of natural drawing order, it is difficult to build models that can generate natural drawing processes. In this paper, we present a study comparing multiple types of stroke orders to confirm findings from previous work, and we demonstrate that multiple orderings of the same set of strokes can be perceived as human-drawn and that different stroke order types achieve different perceived naturalness depending on the type of image prompt.
- 316L stainless steel (316L SS) is a flagship material for structural applications in corrosive environments, having been extensively studied for decades for its favorable balance between mechanical and corrosion properties. More recently, 316L SS has also proven to have excellent printability when parts are produced with additive manufacturing techniques, notably laser powder bed fusion (LPBF). Because of the harsh thermo-mechanical cycles experienced during rapid solidification and cooling, LPBF processing tends to generate unique microstructures. Strong heterogeneities can be found inside grains, including trapped elements, nano-inclusions, and a high density of dislocations that form the so-called cellular structure. Interestingly, LPBF 316L SS not only exhibits better mechanical properties than its conventionally processed counterpart, but it also usually offers much higher resistance to pitting in chloride solutions. Unfortunately, the complexity of the LPBF microstructures, together with process-induced defects such as porosity and surface roughness, has slowed progress toward linking specific microstructural features to corrosion susceptibility and has complicated the development of calibrated simulations of pitting phenomena. The first part of this article is dedicated to an in-depth review of the microstructures found in LPBF 316L SS and their potential effects on corrosion properties, with an emphasis on pitting resistance. The second part offers a perspective on some relevant modeling techniques available to simulate the corrosion of LPBF 316L SS, including current challenges that should be overcome.
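
The entry on searching over Einstein-summation operators covers dense, low-rank, Kronecker, and related structured layers. The sketch below, assuming PyTorch, shows how the first three can each be written as a single einsum; the shapes, initialization scales, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: structured linear layers expressed as Einstein summations.
# Assumes PyTorch; shapes and scales are illustrative, not the paper's setup.
import torch

batch, d_in, d_out, rank = 32, 256, 256, 16
x = torch.randn(batch, d_in)

# Dense layer: one einsum with d_in * d_out parameters.
W = torch.randn(d_in, d_out) / d_in**0.5
y_dense = torch.einsum('bi,io->bo', x, W)

# Low-rank layer: W factored as U @ V, with (d_in + d_out) * rank parameters.
U = torch.randn(d_in, rank) / d_in**0.5
V = torch.randn(rank, d_out) / rank**0.5
y_lowrank = torch.einsum('bi,ir,ro->bo', x, U, V)

# Kronecker layer: reshape the input to a 16 x 16 grid and contract with two
# small factors; this computes A @ X @ B.T, i.e. multiplication by A ⊗ B
# up to a reshape convention.
A = torch.randn(16, 16) / 4.0
B = torch.randn(16, 16) / 4.0
X = x.reshape(batch, 16, 16)
y_kron = torch.einsum('bij,pi,qj->bpq', X, A, B).reshape(batch, d_out)
```

Here the Kronecker layer uses 2 * 16 * 16 = 512 parameters in place of the dense layer's 65,536, which is the kind of parameters-per-unit-of-compute trade-off the paper's taxonomy analyzes.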
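
The function-space MAP entry contrasts its method with standard parameter-space MAP. The sketch below, assuming PyTorch, shows only that standard baseline: minimizing a training loss plus an L2 penalty maximizes the log parameter posterior under a zero-mean Gaussian prior. The model, data, and prior variance are illustrative; the paper's function-space procedure is not reproduced here.

```python
# Minimal sketch of parameter-space MAP estimation (the baseline the paper
# argues against), assuming PyTorch. Model, data, and prior are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
x, y = torch.randn(128, 10), torch.randn(128, 1)
prior_var = 1.0  # variance of the Gaussian prior N(0, prior_var * I) on weights

def neg_log_posterior():
    # Proportional to the Gaussian negative log-likelihood, up to constants.
    nll = nn.functional.mse_loss(model(x), y, reduction='sum')
    # -log prior = sum(w^2) / (2 * prior_var), i.e. ordinary L2 regularization.
    neg_log_prior = sum((p ** 2).sum() for p in model.parameters()) / (2 * prior_var)
    return nll + neg_log_prior

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    neg_log_posterior().backward()
    opt.step()
```

The paper's observation is that the maximizer of this parameter posterior need not induce the most likely function: re-parametrizing the model changes which parameter setting wins without changing the functions the model can express.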
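
The FS-EB entry describes combining parameter- and function-space regularization. Below is a generic, hypothetical sketch of that combination, not the paper's exact objective: a standard weight penalty plus a penalty anchoring predictions to a reference function on unlabeled context points. The names reference, x_context, lam_param, and lam_fn are illustrative assumptions.

```python
# Hypothetical sketch of combined parameter- and function-space regularization,
# assuming PyTorch. A generic illustration, not the paper's FS-EB objective.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
# A reference function encoding a preference, e.g. a frozen pre-trained model.
reference = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
for p in reference.parameters():
    p.requires_grad_(False)

x, y = torch.randn(128, 10), torch.randn(128, 1)  # labeled training data
x_context = torch.randn(64, 10)    # unlabeled context points (assumed)
lam_param, lam_fn = 1e-4, 1e-2     # illustrative regularization strengths

def regularized_loss():
    fit = nn.functional.mse_loss(model(x), y)
    # Parameter-space term: ordinary L2 penalty on the weights.
    param_reg = sum((p ** 2).sum() for p in model.parameters())
    # Function-space term: keep predictions close to the reference function
    # on the context points, encoding a preference over functions directly.
    fn_reg = ((model(x_context) - reference(x_context)) ** 2).mean()
    return fit + lam_param * param_reg + lam_fn * fn_reg
```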